1.  Introduction

1.1.  Background 

In the course of vulnerability modeling or any other computer simulation activity, the need
arises to quantify certain aspects of the behavior of the model itself as an autonomous system.
This is distinct from use of the model as a predictive or analytic tool, in that the simulation
model itself is now the subject of study. In the words of Iman et al.:

“... it is important to have efficient techniques to examine and assess the influence of
model input on model output. That is, it is important to be able to perform sensitivity
analyses on the relationship between information supplied to the model and predictions
made by the model. The benefits of such analyses include the following: 1) an
indication whether the model operates as intended, 2) identification of unimportant
variables or unnecessary model complexity, and 3) an assessment of relative input variable
importance for guidance in data collection.”

A directive to analyze the behavior of the compartment model for the purpose of determining
the relative importance of its input variables has led to the application of the methodology
presented in this report. The techniques presented herein are applicable to the sensitivity
analysis of computer simulations in general, and consideration should be given to their
incorporation into such analyses.

1.2.  The  Basic  Problem 

Conceptually, the vector input $x$ and scalar output $y$ of a simulation model are functionally
related by $y = F(x)$, where the function $F$ is unknown. Of interest is the local sensitivity of the
model (i.e., the relationship between changes in $x$ and changes in $y$ when $x$ is centered about a
single fixed operating point with input $x_0$ and output $y_0 = F(x_0)$).

The local Taylor series representation of $F$ at $x_0$ is

$$F(x) = F(x_0) + \frac{\partial F}{\partial x}(x_0)'\,\Delta x + \frac{1}{2}\,\Delta x'\,\frac{\partial^2 F}{\partial x^2}(x_0)\,\Delta x + \cdots \qquad [1]$$

where $\Delta x = x - x_0$ is the incremental change in $x$ about the operating point. Vectors are
columns, and throughout this presentation $A'$ denotes the transpose of $A$. Truncating the Taylor
series to first order, we obtain the approximation

$$F(x) \approx F(x_0) + \frac{\partial F}{\partial x}(x_0)'\,\Delta x \qquad [2]$$

or

$$\Delta y = b'\,\Delta x \qquad [3]$$

where

$$\Delta y = y - y_0 = F(x) - F(x_0) \qquad [4]$$

is the incremental change in $y$ about the operating point $y_0$, and the derivative vector
$b = \frac{\partial F}{\partial x}(x_0)$ relates $\Delta y$ to $\Delta x$. The components of $b$ thus quantify the sensitivity
of the model to changes in input and allow us to answer questions about the relative importance
of the various input dimensions.

Let us suppose now that we have a number of observations $(\Delta x_i, \Delta y_i)$, each representing a
slight variation in the model input and output about the operating point. We may construct a
vector $Y$ with component $Y_i$ equal to $\Delta y_i$ and a matrix $X$ with row $i$ equal to $\Delta x_i'$. Noting that
$b'\,\Delta x_i = \Delta x_i'\,b$, the collection of equations [3] may be written succinctly as

$$Y = Xb \qquad [5]$$

which expresses the problem of estimating $b$ in the language of linear regression.
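
As a rough illustration of equation [5], the sketch below estimates b by ordinary least squares from small random perturbations about an operating point. The model `toy_model`, the helper `estimate_sensitivities`, the perturbation size, and the use of NumPy are assumptions made for this example; they are not part of the original study.

```python
import numpy as np


def estimate_sensitivities(model, x0, dX):
    """Least-squares estimate of b in Y = Xb from perturbations dX (n x p) about x0."""
    x0 = np.asarray(x0, dtype=float)
    y0 = model(x0)
    # Row i of dX is one perturbation; Y[i] is the corresponding change in output.
    Y = np.array([model(x0 + dx) - y0 for dx in dX])
    b, *_ = np.linalg.lstsq(dX, Y, rcond=None)
    return b


def toy_model(x):
    # Hypothetical stand-in for the simulation; its gradient at (1, 2) is (4, -1.5).
    return 3.0 * x[0] - 2.0 * x[1] + 0.5 * x[0] * x[1]


rng = np.random.default_rng(0)
x0 = np.array([1.0, 2.0])
dX = rng.uniform(-0.01, 0.01, size=(20, 2))  # small variations about x0
b_hat = estimate_sensitivities(toy_model, x0, dX)
print(b_hat)
```

Because the perturbations are small, the neglected second-order term is tiny and the estimate should land close to the true gradient of the toy model.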


1.3.  Design  Considerations 

The problem here is to estimate the tangent plane of a multidimensional surface at a single
point. We assume that the true response is a “nice” function (i.e., differentiable, smooth,
continuous, etc.), so that the response is locally linear, given small enough variation in the input.
We assume here that the analyst has control over the design of this experiment. These
questions arise:

•  What  is  the  operating  point? 

•  How  many  observations  are  needed? 

•  What  kinds  of  variation  in  the  input  need  to  be  considered? 

•  What  is  the  best  way  to  specify  the  design  matrix  X? 

For example, suppose that the input space has dimension two and that we are interested in the
effect of ±10% variation in the input values. Using increments of 2% on the variables gives
a set of 11 values for each input dimension, namely, {-10%, -8%, -6%, -4%, -2%, 0%, 2%, 4%,
6%, 8%, 10%}. Call this set of values S. Constructing all possible pairs of the values gives a
total of 11^2 = 121 points. The design is the product set S x S, and a design point is a pair of
numbers x = (x_1, x_2). The design matrix X has 121 rows, each consisting of a distinct pair x from
S x S. In two dimensions, we can graph the design:

Figure 1. Full-Grid Design

This scheme exercises all possible combinations of the inputs and thus provides good coverage
of the input space at the expense of a large number of design points. Note that in five
dimensions, this design requires 11^5 = 161,051 points.

An alternative is to vary only one dimension of x at a time, leaving the others fixed at the
operating point. With the same set of values S as above, the number of points required here is
1 + 10p, where p is the dimension of x. Here we gain information only along the coordinate
axes of x and not in the off-axis regions. Predictions derived from such a model will, in general,
only be valid when one of the quantities varies ±10% and the other is fixed at zero. Such a design
is unacceptable if one wishes to make predictions based on both quantities having a variation in
the ±10% range simultaneously.

Figure 2. Axis-Only Design
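
The point counts quoted for the two designs can be checked with a short sketch. The helper names `full_grid` and `axis_only` are hypothetical, and the levels are expressed in percent; this is only one way to enumerate the designs.

```python
import itertools
import numpy as np

S = np.arange(-10, 11, 2)  # the 11 levels, in percent


def full_grid(levels, p):
    """All combinations of the levels in p dimensions (len(levels)**p rows)."""
    return np.array(list(itertools.product(levels, repeat=p)))


def axis_only(levels, p):
    """Operating point plus one-at-a-time variation along each axis."""
    rows = [np.zeros(p)]
    for j in range(p):
        for v in levels:
            if v != 0:
                row = np.zeros(p)
                row[j] = v
                rows.append(row)
    return np.array(rows)


print(full_grid(S, 2).shape)  # (121, 2)
print(axis_only(S, 2).shape)  # (21, 2)
print(len(S) ** 5)            # 161051
print(axis_only(S, 5).shape)  # (51, 5)
```

The full grid grows exponentially in p, while the axis-only design grows linearly but never leaves the coordinate axes.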

A design which provides complete coverage of the input space and also offers control over the
number of points is more useful than the full-grid and axis-only alternatives. The Latin
Hypercube Sampling (LHS) design has these desirable characteristics. Consider pairing two
random permutations of the base set S to generate a design with 11 points. This procedure is the
basis of LHS.


Table 1. LHS with 11 Points in 2 Dimensions

Figure 3. LHS with 11 Points in 2 Dimensions
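
The pairing of random permutations described above can be sketched as follows. The function name `paired_permutation_design` and the seed are assumptions for illustration; the output is one possible 11-point design, not the one tabulated in the report.

```python
import numpy as np


def paired_permutation_design(levels, p, rng):
    """Pair p independent random permutations of the levels, one per column."""
    levels = np.asarray(levels)
    return np.column_stack([rng.permutation(levels) for _ in range(p)])


rng = np.random.default_rng(1)
S = np.arange(-10, 11, 2)  # the 11 levels, in percent
design = paired_permutation_design(S, 2, rng)
print(design)  # 11 rows; every level appears exactly once in each column
```

Each level is exercised once per dimension, so the design covers the full range of both inputs with only 11 points.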

2.  The  Latin  Hypercube  Sample

In full generality, the Latin Hypercube Sample allows free choice of the number of design
points (henceforth denoted by n), the dimension of the input space (p), the marginal
probability distribution of each of the p input variables, and the correlation structure of the
input space.

2.1.  Definition 

A p-dimensional Latin Hypercube Sample of size n is formed as follows:

Divide the range of each variable into n bins based on equal width or equal probability.

For each variable, select a point at random from each bin. Then randomly order the
points for each variable and combine them to form p-tuples. The resulting collection of
n vectors, each of length p, is a Latin Hypercube Sample.

We begin by examining the simplest case and proceed to develop generality by presenting
examples of the more involved constructions.

2.2.  Discrete  Uniform  Distribution 

The simplest case is a one-dimensional (1-D) LHS of size n drawn from the discrete uniform
distribution. Without loss of generality, we can take the allowable variable values to be the set
N = {0, 1, 2, ..., n-1}. The LHS is then a random permutation of N. For example, with n = 10 we
have:


Table 2. Discrete, Uniform, 1-D LHS

         x1   x2   x3   x4   x5   x6   x7   x8   x9   x10
   X      7    0    2    3    6    1    8    9    5    4

Independent 1-D samples are combined to form higher-dimensional samples:


Table 3. Discrete, Uniform, 2-D LHS

         x1   x2   x3   x4   x5   x6   x7   x8   x9   x10
   X1     7    0    2    3    6    1    8    9    5    4
   X2     2    0    1    9    3    7    5    4    8    6

In the LHS, there is no connection between the number of sample points and the dimension of
the input space, in contrast with the previously considered designs.

2.3.  Continuous  Uniform  Distribution 

Now we consider a sample with uniform distribution on the unit interval, U(0,1). Construction
of this sample is based on the discrete sample of the previous section. For p = 1, the 1-D case, an
LHS of size n is constructed as follows:

a.  Generate a random permutation of the set N = {0, 1, 2, ..., n-1}.

b.  Add a U(0,1) random quantity to each element of the permutation.

c.  Divide by n to scale the sample into the interval (0,1).

For example, with n = 10, we have the results in Table 4.


Table 4. Development of Continuous, Uniform, 1-D LHS

   step     x1      x2      x3      x4      x5      x6      x7      x8      x9      x10
    a        7       0       2       3       6       1       8       9       5       4
    b      7.690   0.526   2.564   3.884   6.558   1.751   8.667   9.654   5.816   4.272
    c      0.769   0.053   0.236   0.388   0.656   0.175   0.867   0.965   0.582   0.427

Note that we effectively divide the allowable variable range (0,1) into n equiprobable bins
(0,0.1), (0.1,0.2), ..., (0.9,1); order the bins randomly; and then select a point from each bin,
again with equal probability.
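
Steps a through c can be sketched in a few lines. The function name `lhs_uniform` and the seed are assumptions for this example; the check at the end confirms that each of the n bins receives exactly one point in every dimension.

```python
import numpy as np


def lhs_uniform(n, p, rng):
    """n x p Latin Hypercube Sample with U(0,1) marginals."""
    # Step a: one independent random permutation of 0..n-1 per column.
    perms = np.column_stack([rng.permutation(n) for _ in range(p)])
    # Steps b and c: add U(0,1) jitter within each bin, then scale into (0,1).
    return (perms + rng.uniform(size=(n, p))) / n


rng = np.random.default_rng(2)
sample = lhs_uniform(10, 2, rng)
bins = np.floor(sample * 10).astype(int)  # bin index of every point
print(np.sort(bins, axis=0).T)            # each row lists the bins 0..9 once
```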

A  2-D  LHS  is  formed  by  generating  independent  1-D  samples  for  each  variable.  Adding 
another  dimension  to  the  previous  example  gives  the  results  in  Table  5. 


Table 5. Continuous, Uniform, 2-D LHS

           x1      x2      x3      x4      x5      x6      x7      x8      x9      x10
   X1    0.769   0.053   0.236   0.388   0.656   0.175   0.867   0.965   0.582   0.427
   X2    0.215   0.041   0.115   0.929   0.362   0.746   0.560   0.485   0.868   0.609

Higher-dimensional  samples  are  formed  by  generating  independent  samples  in  each 
dimension. 

2.4.  Arbitrary  Distributions

The LHS examples generated previously have the U(0,1) uniform distribution in each
dimension. Transformation to other continuous distributions can be accomplished
independently in each dimension by applying the appropriate inverse probability integral
transform (inverse cumulative distribution function). The argument is reproduced here:

Let the random variable U have the uniform distribution on the unit interval. The
cumulative distribution function (cdf) of U is Prob{U ≤ t} = t for 0 ≤ t ≤ 1. We
wish to transform U into a random quantity X with a specified cdf F(x). So take
X = F^{-1}(U). Then Prob{X ≤ x} = Prob{F^{-1}(U) ≤ x} = Prob{U ≤ F(x)} = F(x),
as desired.
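
A minimal sketch of the transform, assuming an exponential target distribution with rate 2 purely as an example (the report does not specify one). The helper names are hypothetical. The inverse cdf of the exponential distribution is F^{-1}(u) = -ln(1 - u) / rate, and because it is increasing, the ordering of the points within each column is unchanged.

```python
import numpy as np


def lhs_uniform(n, p, rng):
    """n x p Latin Hypercube Sample with U(0,1) marginals."""
    perms = np.column_stack([rng.permutation(n) for _ in range(p)])
    return (perms + rng.uniform(size=(n, p))) / n


def exponential_inverse_cdf(u, rate):
    """Inverse cdf of the exponential distribution, F^{-1}(u) = -ln(1 - u) / rate."""
    return -np.log1p(-u) / rate


rng = np.random.default_rng(3)
u = lhs_uniform(20, 2, rng)
x = exponential_inverse_cdf(u, rate=2.0)

# Applying the cdf F(x) = 1 - exp(-rate * x) recovers the uniform sample.
u_back = 1.0 - np.exp(-2.0 * x)
print(np.allclose(u_back, u))
```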

2.5.  Orthogonal  Design  In  Linear  Regression 

The design matrix X in a linear regression problem (equation [5]) is said to be orthogonal if the
product X'X is a diagonal matrix. The variables (columns) of such a design are then
uncorrelated. In the statistical literature, the term multicollinearity refers to a departure from
orthogonality. On one hand, orthogonality is an absolute. Either a matrix is orthogonal or it
isn’t. In contrast, use of the word multicollinearity is intended to suggest some degree of linear
dependence among a set of vectors. Thus, multicollinearity is subject to quantification and
comparison. Common measures of multicollinearity include variance inflation factors, the
determinant, various types of matrix metrics, and various definitions of the condition number.
Further discussion of multicollinearity is deferred to section 2.7, where several of these
measures are defined and used.

One of the computational benefits of an orthogonal design is that calculation of the parameter
estimate for one of the variables involves only that particular column of the design matrix
(along with the dependent variable), so variables can be added or deleted from the design
scheme without recalculating all estimates. Likewise, a single column can be changed and the
corresponding parameter can be re-estimated independently of the others. A second
advantage of the orthogonal design is the optimal variance property, which essentially states
that parameter estimates have minimum variance when the design is orthogonal. Practically
speaking, this corresponds to reduced error estimates.
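
The column-by-column property can be seen in a small sketch. The 4 x 2 design and the coefficient values are made up for the illustration. When X'X is diagonal, the least-squares estimate of coefficient j reduces to x_j'Y / x_j'x_j, which involves only column j.

```python
import numpy as np

# Hypothetical orthogonal design: the two columns have zero inner product.
X = np.array([[1.0, 1.0],
              [1.0, -1.0],
              [-1.0, 1.0],
              [-1.0, -1.0]])
b_true = np.array([4.0, -1.5])
Y = X @ b_true

gram = X.T @ X  # diagonal for an orthogonal design

# Full least-squares solution using every column at once.
b_full, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Per-column estimates, each using only one column of X and the response Y.
b_by_column = np.array([X[:, j] @ Y / (X[:, j] @ X[:, j]) for j in range(X.shape[1])])

print(gram)
print(b_full, b_by_column)
```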

2.6.  Correlation  and  Correlation  Conditioning

The rank correlation of any continuous LHS, whether it be drawn from the uniform
distribution or an arbitrary distribution, is identically equal to the rank correlation of the
underlying discrete uniform LHS from which the sample was obtained. So, all inquiries
concerning the rank correlation of an LHS can be addressed by considering the discrete
uniform case.

Ideally, the variables (i.e., columns) of an LHS should be uncorrelated, as they were generated
independently. In practice, of course, these vectors exhibit nonzero correlation. For example,
here is an integer LHS with 10 observations and 5 variables:

$$
X = \begin{bmatrix}
 8 &  4 &  8 &  7 &  1 \\
 4 &  5 &  5 &  3 &  2 \\
 5 & 10 & 10 &  8 &  8 \\
 1 &  7 &  9 & 10 &  9 \\
 6 &  2 &  7 &  4 &  7 \\
 9 &  9 &  1 &  5 & 10 \\
 2 &  1 &  6 &  1 &  6 \\
10 &  3 &  3 &  2 &  5 \\
 3 &  6 &  2 &  9 &  4 \\
 7 &  8 &  4 &  6 &  3
\end{bmatrix} \qquad [6]
$$

To an accuracy of two decimal places, this sample has rank correlation:

$$
\rho_X = \begin{bmatrix}
1.00 & 0.07 & -0.39 & -0.30 & -0.13 \\
     & 1.00 & -0.01 &  0.64 &  0.33 \\
     &      &  1.00 &  0.31 &  0.08 \\
     &      &       &  1.00 &  0.15 \\
     &      &       &       &  1.00
\end{bmatrix} \qquad [7]
$$


The ideal correlation structure of such a sample should be the identity matrix I (i.e., distinct
variables should be uncorrelated). In the 1-D case, we divide each element by the standard
deviation (square root of the variance) of the sample to scale the sample variance to unity.
Analogous procedures can be used in higher dimensions to produce uncorrelated vectors.
One way of accomplishing this “decoupling” in the multidimensional case is presented here.

Let S be the sample variance-covariance matrix of X. The diagonal elements of S are the
sample variances of the input vectors, and the off-diagonal elements are the sample
covariances. Let T'T be the Cholesky decomposition of S. Then T is upper-triangular and
T'T = S. Let $Q = T^{-1}$ and consider the quantity XQ. Applying standard identities
concerning the variance of multivariate random quantities, we have

$$
\begin{aligned}
\operatorname{var}(XQ) &= Q'\,\operatorname{var}(X)\,Q \\
&= Q'\,S\,Q \\
&= Q'\,T'T\,Q \\
&= (TQ)'(TQ) \\
&= I'I \\
&= I \qquad [8]
\end{aligned}
$$

so the product XQ has unit variance. We have “divided” X by the “square root” of its variance to
produce an object with the required variance I. The resulting correlation structure will also be
I. In this case,


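$$
S = T'T = \begin{bmatrix}
 9.167 &  0.611 & -3.611 & -2.722 & -1.167 \\
 0.611 &  9.167 & -0.056 &  5.833 &  3.056 \\
-3.611 & -0.056 &  9.167 &  2.833 &  0.722 \\
-2.722 &  5.833 &  2.833 &  9.167 &  1.389 \\
-1.167 &  3.056 &  0.722 &  1.389 &  9.167
\end{bmatrix} \qquad [9]
$$

$$
T = \begin{bmatrix}
3.028 & 0.202 & -1.193 & -0.899 & -0.385 \\
0     & 3.021 &  0.061 &  1.991 &  1.037 \\
0     & 0     &  2.782 &  0.589 &  0.072 \\
0     & 0     &  0     &  2.012 & -0.529 \\
0     & 0     &  0     &  0     &  2.767
\end{bmatrix} \qquad [10]
$$

and

$$
Q = T^{-1} = \begin{bmatrix}
0.330 & -0.022 &  0.142 &  0.128 &  0.075 \\
0     &  0.331 & -0.007 & -0.325 & -0.186 \\
0     &  0     &  0.359 & -0.105 & -0.029 \\
0     &  0     &  0     &  0.497 &  0.095 \\
0     &  0     &  0     &  0     &  0.361
\end{bmatrix} \qquad [11]
$$

so that

$$
XQ = \begin{bmatrix}
2.642 & 1.148 & 3.983 &  2.359 & 0.647 \\
1.321 & 1.567 & 2.329 & -0.151 & 0.230 \\
1.651 & 3.200 & 4.232 &  0.309 & 1.871 \\
0.330 & 2.295 & 3.326 &  1.873 & 2.710 \\
1.982 & 0.530 & 3.354 &  1.368 & 2.782 \\
2.973 & 2.781 & 1.573 &  0.601 & 3.060 \\
0.661 & 0.287 & 2.433 & -0.204 & 2.051 \\
3.303 & 0.772 & 2.477 &  0.981 & 2.101 \\
0.991 & 1.920 & 1.101 &  2.694 & 1.351 \\
2.312 & 2.494 & 2.374 &  0.853 & 0.573
\end{bmatrix} \qquad [12]
$$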

Now let each column of the matrix Y contain the ranks of the data in the corresponding column
of XQ. Transforming data to ranks changes variance but not rank correlation. This operation
yields an integer matrix, each column of which can be viewed as a permutation of the
corresponding column of the original matrix X. The result is:

$$
Y = \begin{bmatrix}
 8 &  4 &  9 &  9 &  3 \\
 4 &  5 &  3 &  2 &  1 \\
 5 & 10 & 10 &  3 &  5 \\
 1 &  7 &  7 &  8 &  8 \\
 6 &  2 &  8 &  7 &  9 \\
 9 &  9 &  2 &  4 & 10 \\
 2 &  1 &  5 &  1 &  6 \\
10 &  3 &  6 &  6 &  7 \\
 3 &  6 &  1 & 10 &  4 \\
 7 &  8 &  4 &  5 &  2
\end{bmatrix} \qquad [13]
$$

which now has rank correlation

$$
\rho_Y = \begin{bmatrix}
1.00 & 0.07 &  0.07 &  0.07 &  0.13 \\
     & 1.00 & -0.09 & -0.02 & -0.05 \\
     &      &  1.00 &  0.07 &  0.12 \\
     &      &       &  1.00 &  0.08 \\
     &      &       &       &  1.00
\end{bmatrix} \qquad [14]
$$


Compare this with the rank correlation of the original sample X (equation [7]). The effect of
such a transformation is not entirely obvious, as most observers are not able to visualize
higher-dimensional objects. Certainly some of the offensive correlations have decreased, but
several of the off-diagonal elements of the correlation structure have increased in magnitude.
However, there are a number of ways to measure multicollinearity in the sample.
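
One step of the procedure can be sketched with NumPy as follows. The helper names are hypothetical, and the matrix X in the sketch is an assumption: a 10 x 5 integer sample chosen so that its sample covariance reproduces the matrix S of equation [9]. `np.linalg.cholesky` returns a lower-triangular factor, so its transpose plays the role of T.

```python
import numpy as np


def ranks(a):
    """Column-wise ranks 1..n (assumes no ties within a column)."""
    return np.argsort(np.argsort(a, axis=0), axis=0) + 1


def correct_once(X):
    """One step of correlation correction: return XQ and its column ranks Y."""
    S = np.cov(X, rowvar=False)   # sample variance-covariance matrix
    T = np.linalg.cholesky(S).T   # upper-triangular factor with T'T = S
    Q = np.linalg.inv(T)
    XQ = X @ Q
    return XQ, ranks(XQ)


def max_offdiag_corr(Z):
    """Largest absolute off-diagonal correlation between the columns of Z."""
    R = np.corrcoef(Z, rowvar=False)
    return np.abs(R - np.eye(R.shape[0])).max()


# Assumed integer sample; every column is a permutation of 1..10.
X = np.array([[8, 4, 8, 7, 1], [4, 5, 5, 3, 2], [5, 10, 10, 8, 8],
              [1, 7, 9, 10, 9], [6, 2, 7, 4, 7], [9, 9, 1, 5, 10],
              [2, 1, 6, 1, 6], [10, 3, 3, 2, 5], [3, 6, 2, 9, 4],
              [7, 8, 4, 6, 3]])

XQ, Y = correct_once(X)
print(np.round(np.cov(XQ, rowvar=False), 3))     # identity, up to rounding
print(Y)                                         # integer ranks, column by column
print(max_offdiag_corr(X), max_offdiag_corr(Y))  # largest correlation before and after
```

Since the columns of X and Y are already ranks, their ordinary correlation matrix is the rank correlation discussed in the text.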


2.7.  Measures  of  Performance 

Perhaps the most common scalar measures of multicollinearity associated with a design matrix
are the condition number and determinant measure.

The eigenvalues $\lambda_i$ of the ideal correlation structure are all equal to 1. The determinant of a
matrix is the product of its eigenvalues. Hence, the determinant $\delta$ of the ideal structure is also
equal to 1. A condition number $\kappa$ may be defined as the ratio of largest to smallest eigenvalues,
$\lambda_{\max}/\lambda_{\min}$. This quantity will also be 1 in the ideal case. Finally, we may consider the norm
$\varepsilon$ (also called the Euclidean norm or root mean square distance) between the sample’s correlation
structure and the ideal identity matrix. This quantity should be zero. Refer to the first two lines
of Table 6 for measures associated with the samples X and Y.
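
The three measures can be computed from a correlation matrix as sketched below. The function name is hypothetical, and reading the norm as the Frobenius norm of the difference from the identity is an assumption about what “Euclidean norm” means here.

```python
import numpy as np


def multicollinearity_measures(R):
    """Return (condition number, determinant, norm) for a correlation matrix R."""
    eig = np.linalg.eigvalsh(R)   # eigenvalues of the symmetric matrix R
    kappa = eig.max() / eig.min() # ratio of largest to smallest eigenvalue
    delta = float(np.prod(eig))   # determinant as the product of eigenvalues
    # Assumed reading of the norm: Frobenius distance from the identity matrix.
    eps = float(np.linalg.norm(R - np.eye(R.shape[0])))
    return kappa, delta, eps


ideal = multicollinearity_measures(np.eye(5))
correlated = multicollinearity_measures(np.array([[1.0, 0.5], [0.5, 1.0]]))
print(ideal)       # condition number 1, determinant 1, norm 0
print(correlated)  # eigenvalues 0.5 and 1.5 give condition number 3, determinant 0.75
```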

By all indications, this procedure has improved (decreased) the correlation of the sample.
Note that the product XQ indeed has exact unit correlation and that the final step of replacing
columns of XQ by column ranks again disturbs the correlation structure. It is natural to
consider iterative application of the procedure in hopes of obtaining a “limiting” sample with
the “most ideal” correlation under the constraint of replacing columns with column ranks. We
can repeat the procedure and generate a sample Y+ from Y in the same manner that Y was
generated from X. The details are not reproduced here, but another application of the
procedure permutes four elements in the last column, and the result is

$$
Y^{+} = \begin{bmatrix}
 8 &  4 &  9 &  9 &  2 \\
 4 &  5 &  3 &  2 &  1 \\
 5 & 10 & 10 &  3 &  5 \\
 1 &  7 &  7 &  8 &  8 \\
 6 &  2 &  8 &  7 &  9 \\
 9 &  9 &  2 &  4 & 10 \\
 2 &  1 &  5 &  1 &  7 \\
10 &  3 &  6 &  6 &  6 \\
 3 &  6 &  1 & 10 &  4 \\
 7 &  8 &  4 &  5 &  3
\end{bmatrix} \qquad [15]
$$

The resulting correlation structure is

$$
\rho_{Y^{+}} = \begin{bmatrix}
1.00 & 0.07 &  0.07 &  0.07 &  0.02 \\
     & 1.00 & -0.09 & -0.02 & -0.03 \\
     &      &  1.00 &  0.07 &  0.04 \\
     &      &       &  1.00 & -0.03 \\
     &      &       &       &  1.00
\end{bmatrix} \qquad [16]
$$

The associated measures of multicollinearity are presented in Table 6.


Table 6. Measures of Multicollinearity

For this particular sample, the process has terminated. Another application does not change
the sample.

2.8.  The  General  Effect  of  Correlation  Conditioning 

We can demonstrate the effect of correlation correction by considering this simulation:

•  Generate 1,000 LH samples, each with 100 observations in 5 variables,
   with no correlation correction.

•  Generate 1,000 LH samples, each with 100 observations in 5 variables,
   with a single step of correlation correction.

•  Generate 1,000 LH samples, each with 100 observations in 5 variables,
   with completed correlation correction.

•  Compute and compare the cumulative probability distributions of
   $\kappa$, the condition numbers,
   $\delta$, the determinants of the empirical correlation structures $\rho$, and
   $\varepsilon$, the $L_2$ norms $\|\rho - I\|_2$,
   for each of the three sets of samples.

Figures 4 through 6 depict the empirical distributions of $\kappa$, $\delta$, and $\varepsilon$. Neither the single example
nor the simulation provides proof of the effectiveness of the correlation correction procedure,
but the indication is that single correction substantially improves the behavior of the sample,
and that completed correction further improves the behavior of the sample.
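
A scaled-down version of this experiment can be sketched as follows. It uses 50 samples instead of 1,000 and tracks only the condition number; the helper names, the seed, and the iteration cap `max_iter` (a guard in case the iteration does not settle) are assumptions of this sketch, not details from the report.

```python
import numpy as np


def ranks(a):
    """Column-wise ranks 1..n (assumes no ties within a column)."""
    return np.argsort(np.argsort(a, axis=0), axis=0) + 1


def integer_lhs(n, p, rng):
    """Integer LHS: each column is a random permutation of 1..n."""
    return np.column_stack([rng.permutation(n) + 1 for _ in range(p)])


def correct_once(X):
    """Single correlation correction: ranks of XQ with Q = T^{-1}, T'T = cov(X)."""
    T = np.linalg.cholesky(np.cov(X, rowvar=False)).T
    return ranks(X @ np.linalg.inv(T))


def correct_completely(X, max_iter=25):
    """Repeat the correction until the sample stops changing (or the cap is hit)."""
    for _ in range(max_iter):
        Y = correct_once(X)
        if np.array_equal(Y, X):
            break
        X = Y
    return X


def condition_number(X):
    """Ratio of largest to smallest eigenvalue of the rank correlation of X."""
    eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return eig[-1] / eig[0]


rng = np.random.default_rng(4)
samples = [integer_lhs(100, 5, rng) for _ in range(50)]
kappa_none = np.median([condition_number(X) for X in samples])
kappa_single = np.median([condition_number(correct_once(X)) for X in samples])
kappa_complete = np.median([condition_number(correct_completely(X)) for X in samples])
print(kappa_none, kappa_single, kappa_complete)
```

With so few samples the medians are noisy, so the sketch is only meant to show the direction of the effect, not to reproduce the tabulated quantiles.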

Table 7 details results from the simulation in the form of empirical quantiles (q) for each
measure of multicollinearity ($\kappa$, $\delta$, and $\varepsilon$) at each level of correlation correction (none, single,
and complete). Such tabulations facilitate quantitative observations about the distributions
under study. For example:

Note that 99% of $\kappa_0$ lies above 1.349, whereas 99% of $\kappa_1$ lies below 1.127. This is
complete separation of distributions, for all practical purposes. Also, 85% of $\kappa_1$
lies above 1.046, whereas 99% of $\kappa_\infty$ lies below 1.046. Empirically, this indicates a
probability of 0.85 that the condition number of an LHS with single correlation
correction exceeds 1.046, and a probability of 0.01 that the condition number of an
LHS with completed correlation correction exceeds 1.046. This may be an
important difference in practical applications.


Figure  4.  Condition  Numbers 

Table 7. Quantiles of Performance Measures

          Condition Number (κ)       Determinant (δ)              Norm (ε)
    q      κ_0     κ_1     κ_∞      δ_0      δ_1      δ_∞       ε_0     ε_1     ε_∞
    1%    1.349   1.033   1.025   0.7913   0.9956   0.9994    0.011   0.001    --
    5%    1.435   1.040   1.028   0.8369   0.9971   0.9994    0.048   0.004   0.002
   10%    1.489   1.043   1.029   0.8523   0.9977   0.9995    0.086   0.009   0.005
   15%    1.531   1.046   1.031   0.8637   0.9979   0.9995    0.12?   0.013   0.007
   20%    1.563   1.049   1.031   0.8713   0.9982   0.9995    0.158    --      --
   25%    1.601   1.051   1.032   0.8784   0.9983   0.9995    0.205   0.022   0.012
   30%    1.630   1.053   1.033   0.8851   0.9985   0.9996    0.247   0.026   0.014
   35%    1.658   1.056   1.034   0.8927   0.9986   0.9996    0.284   0.030   0.017
   40%    1.687   1.058   1.034   0.8981   0.9987   0.9996    0.330   0.036   0.021
   45%    1.710   1.060   1.035   0.9039   0.9987   0.9996    0.370   0.042   0.023
   50%    1.739   1.063   1.036   0.9092   0.9988   0.9996    0.427   0.047   0.025
   55%    1.766   1.065   1.036   0.9139   0.9989   0.9996    0.473    --     0.029
   60%    1.801   1.069   1.037   0.9190   0.9990   0.9996    0.545   0.059    --
   65%    1.836   1.071   1.037   0.9231   0.9991   0.9996    0.597   0.064   0.036
   70%    1.866   1.074   1.038   0.9286   0.9991   0.9997    0.657   0.070   0.041
   75%    1.909   1.077   1.039   0.9335   0.9992   0.9997    0.721   0.078   0.045
   80%    1.956   1.080   1.040   0.9384   0.9993   0.9997    0.801   0.087   0.050
   85%    2.014   1.087   1.041   0.9437   0.9994   0.9997    0.891   0.099   0.056
   90%    2.084   1.094   1.042   0.9511   0.9994   0.9997    0.992   0.115   0.062
   95%    2.183   1.103   1.044   0.9590   0.9995   0.9998    1.178   0.141   0.072
   99%    2.445   1.127   1.046     --       --     0.9998    1.640   0.196   0.101

key   0:  no correlation correction
      1:  single correlation correction
      ∞:  completed correlation correction
  -- , ?:  entry or digit not legible in the source

2.9.  Inducing  Correlation  in  the  Sample

We can modify the procedure outlined in section 2.6 to induce a desired correlation in the Latin
Hypercube Sample.

Again, we begin with an integer LHS X having sample variance-covariance matrix S. As
before, T'T is the Cholesky decomposition of S, and $Q = T^{-1}$. Now suppose that the desired
correlation structure of the sample is C. Let R'R be the Cholesky decomposition of C, and
consider the product XQR.

$$
\begin{aligned}
\operatorname{var}(XQR) &= (QR)'\,\operatorname{var}(X)\,QR \\
&= R'Q'\,S\,QR \\
&= R'Q'\,T'T\,QR \\
&= R'\,(TQ)'(TQ)\,R \\
&= R'\,I'I\,R \\
&= R'R \\
&= C \qquad [17]
\end{aligned}
$$

Now XQR has variance (and hence correlation) exactly equal to C. Let each column of the
matrix Y contain the ranks of the data in the corresponding column of XQR. Continue as
before, treating Y as the new sample and iterating the procedure. The benefits observed in the
unit-correlation case (section 2.8) carry through to the arbitrary-correlation case.
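
The following sketch performs one pass of this construction for a target correlation of 0.7 between two variables. The function name `induce_correlation`, the sample size, and the seed are assumptions for illustration. The variance of XQR matches C to rounding error, while the rank correlation of Y is only approximately 0.7 because taking ranks perturbs the structure.

```python
import numpy as np


def ranks(a):
    """Column-wise ranks 1..n (assumes no ties within a column)."""
    return np.argsort(np.argsort(a, axis=0), axis=0) + 1


def induce_correlation(X, C):
    """One pass: form XQR with var(XQR) = C, then replace columns by their ranks."""
    S = np.cov(X, rowvar=False)
    T = np.linalg.cholesky(S).T  # upper-triangular, T'T = S
    Q = np.linalg.inv(T)
    R = np.linalg.cholesky(C).T  # upper-triangular, R'R = C
    XQR = X @ Q @ R
    return XQR, ranks(XQR)


rng = np.random.default_rng(5)
n = 100
X = np.column_stack([rng.permutation(n) + 1 for _ in range(2)])  # integer LHS
C = np.array([[1.0, 0.7], [0.7, 1.0]])                           # desired correlation

XQR, Y = induce_correlation(X, C)
rank_corr = np.corrcoef(Y, rowvar=False)[0, 1]
print(np.round(np.cov(XQR, rowvar=False), 3))
print(rank_corr)
```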




3.  Some  Uses  of  the  Latin  Hypercube  Sample 

3.1.  Local  Sensitivity  Analysis:  Single  Operating  Point 

The most basic use of LHS is to model small variation in input around a single operating point
using an uncorrelated uniform sample. This is described in detail in the Introduction of this
report and illustrated in Figure 3.

3.2.  Local  Sensitivity  Analysis:  Multiple  Operating  Points 

Suppose now that we are interested in the local sensitivity of our model at a number of
operating points. For the sake of illustration, take p = 2. Consider three operating points, say,
x_1 = (1,2), x_2 = (2,1), and x_3 = (4,3). We impose ±10% variation on the inputs and
generate 25 perturbations of each operating point. Figure 7 shows the initial operating points,
indicated by “+”, together with the LHS points.

Figure 7. Multiple Local Operating Points
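
A design of this kind can be sketched by scaling a uniform LHS into a ±10% band about each operating point. The helper names and the seed are assumptions; the three operating points and the count of 25 perturbations come from the text.

```python
import numpy as np


def lhs_uniform(n, p, rng):
    """n x p Latin Hypercube Sample with U(0,1) marginals."""
    perms = np.column_stack([rng.permutation(n) for _ in range(p)])
    return (perms + rng.uniform(size=(n, p))) / n


def perturb(point, n, rel, rng):
    """n LHS perturbations of `point`, each coordinate within +/- rel of its value."""
    point = np.asarray(point, dtype=float)
    u = lhs_uniform(n, point.size, rng)            # values in (0, 1)
    return point * (1.0 + rel * (2.0 * u - 1.0))   # map to (1 - rel, 1 + rel) * point


rng = np.random.default_rng(6)
operating_points = [(1.0, 2.0), (2.0, 1.0), (4.0, 3.0)]
designs = [perturb(pt, 25, 0.10, rng) for pt in operating_points]
for pt, d in zip(operating_points, designs):
    print(pt, d.min(axis=0), d.max(axis=0))
```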

3.3.  Global  Sensitivity  Analysis 

We may desire to “connect” the space between operating points and develop a global model of
the simulation under study. In this case, it may be appropriate to induce a particular
correlation in the LHS to provide sampling in desirable regions. Refer to Figure 8, in which
the initial operating points are indicated by “+” and the remaining points are LHS points. Suppose
we choose to regard the points (1,5) and (5,1) as infeasible and the points (1,1) and (5,5) as
realistic extensions of the operating space. The sample in Figure 8 was generated with a
correlation of 0.7 between dimensions, and it apparently conforms to this notion of feasibility.
The uniform distribution was used here, but other distributions may be appropriate depending
on the application. Note that changing marginal distributions through use of the inverse
probability integral transform does not change the rank correlation of the sample, as the
mapping is monotonic increasing.


Figure  8.  Global  Analysis 

4.  Conclusions  and  Recommendations

The Latin Hypercube Sample is appropriately used to generate input for simulation model
sensitivity analyses. Consider the linear regression problem (equation [5]) developed in the
introduction of this report. It is well known that the variance of the parameter estimate
increases as the correlation of the input variables increases, the ideal (minimum variance) case
being that of uncorrelated inputs. Various ways of quantifying multicollinearity, or departure
from orthogonality, have been suggested. These include the condition number and
determinant measures. Statisticians agree that a design with minimal correlation among the input
variables is desirable. However, as Stuart and Ord point out, the word minimal in this context
does not have a unique interpretation:

Stewart ... presents several indices for assessing multicollinearity; the ensuing
discussion indicates the lively debate that persists.

Correlation correction in Latin Hypercube Sampling reduces popular measures of
multicollinearity. This increases the efficiency of subsequent statistical procedures. Therefore, the
correction should be applied when efficiency is an issue.

The  effects  of  higher  correlation  are  amplified  when  dimensionality  of  the  sample  is  high  and 
the  number  of  points  in  the  sample  (cardinality  of  the  sample)  is  low.  Schemes  which  use  a 
large  number  of  high-dimensional,  low-cardinality  samples  may  particularly  benefit  from 
completed  correlation  correction. 